Total motile sperm count (TMSC) and DNA fragmentation index (DFI) are two sperm analysis
parameters that have recently been used to predict the outcome of male infertility
management. TMSC is one of the values considered better than the 2010 World Health
Organization standard sperm analysis in terms of predictive value for the spontaneous
ongoing pregnancy rate. High DFI values are associated with lower pregnancy success and an
increased risk of low fertilization rate or total fertilization failure. In this study, we
developed a method to classify semen samples by TMSC and DFI values using an electronic
nose. In the TMSC study, four algorithms were evaluated and accuracy reached 95%. In the
DFI study, two algorithms gave fairly good results, with accuracy of about 70%.
1. INTRODUCTION 

Artificial intelligence now assists sperm morphology analysis through image processing, and
computer-assisted sperm analysis (CASA) can be performed in real time. However, CASA examination is
quite expensive, and it has not been included in the 2010 WHO guideline for sperm analysis. Another
semen examination method that uses artificial intelligence is the calculation of sperm concentration
with machine learning-based spectroscopy [1]. Semen analysis is the main procedure in infertility
examination and classifies sperm samples as having normal or abnormal values. The sperm analysis is
also used to select, track, and collect healthy sperm for in vitro processing. One study applied
artificial intelligence to microscopic images from sperm analysis to simplify and speed up the
classification of sperm cells, using the faster region convolutional neural network (FRCNN) with an
elliptic scanning algorithm (ESA). This method can detect sperm and identify sperm motility within
1.12 seconds with an accuracy of 97.37% [2]. 

Sperm morphology is a very important factor in the assessment of sperm quality. Various artificial
intelligence methods have been developed to help classify sperm deformities based on datasets of
sperm morphology. One of these studies applies multi-stage cascade-connected preprocessing and
machine learning based on a non-linear support vector machine (SVM) kernel. That study reports an
accuracy of 86.6% for the human sperm head morphology (HuSHeM) dataset and 85.7% for the Sperm
Morphology Image Data Set (SMIDS) [3]. Another study developed an artificial intelligence algorithm
using a network-based deep transfer learning approach and deep multi-task transfer learning (DMTL)
to classify sperm heads, vacuoles, and acrosomes as normal or abnormal. This technique can classify
all parts of the sperm and achieved accuracies of 84.00%, 80.66%, and 94.00% for the head, acrosome,
and vacuole labels, respectively [4]. 

Infertility in men is a health problem that is still common in the community. Various attempts to
improve the outcome of infertility management have focused on parameters that predict the success
rate of pregnancy in couples. Total motile sperm count (TMSC) is one of the values considered better
than the 2010 WHO standard sperm analysis in terms of predictive value for the spontaneous ongoing
pregnancy rate (SOPR). The TMSC value is obtained by multiplying the volume of seminal fluid, the
sperm concentration, and the percentage of motile spermatozoa. In addition, TMSC also plays a role
in increasing the predictive value of pregnancy outcomes in couples undergoing intracytoplasmic
sperm injection. A study showed that the normal TMSC group had a higher fertilization rate (p=0.016)
and a lower miscarriage rate (p=0.041) than the abnormal TMSC group [5], [6]. 
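
The multiplication described above can be written as a short function. This is a minimal sketch rather than the authors' code: the function name, the units (mL, million cells per mL, percent), and the sample values are assumptions made for illustration, and the text does not state whether total or progressive motility is used.

```python
def total_motile_sperm_count(volume_ml, concentration_million_per_ml, motility_percent):
    """Return TMSC in millions of motile sperm per ejaculate.

    Units are assumptions for this sketch: volume in mL, concentration in
    million cells per mL, motility as a percentage between 0 and 100.
    """
    if not 0 <= motility_percent <= 100:
        raise ValueError("motility_percent must be between 0 and 100")
    return volume_ml * concentration_million_per_ml * motility_percent / 100.0


# Hypothetical sample (not from the study): 3.0 mL, 40 million/mL, 50% motile.
tmsc = total_motile_sperm_count(3.0, 40.0, 50.0)
print(tmsc)  # 60.0
```

With these units the result is expressed in millions, so a value of 60.0 corresponds to 60x10^6 motile sperm per ejaculate.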

A study that used processed TMSC and 24-hour sperm survival as predictors showed that a processed
TMSC value >10x10^6 and a 24-hour sperm survival value >70% were good predictors of IUI success,
while a processed TMSC value <5x10^6 and a 24-hour sperm survival <30% were poor predictors.
The pattern of TMSC values over time is also important, because increasing age and infertility
therapy both affect the trend of TMSC values. A meta-analysis examining the relationship between
age, TMSC scores, and TMSC groups by year showed that the proportion of normozoospermic men
decreased and the risk of men needing infertility treatment increased over the study period.
Although the study method was retrospective, the shift in the group of men requiring clinical
infertility treatment is quite relevant. Thus, it can be concluded that the TMSC value has an
important role in the prognosis of infertility therapy. In contrast, another study showed that TMSC
had no significant effect on IUI success in patients with unexplained infertility. In that study,
the outcome parameters were live birth rate (LBR) and clinical pregnancy rate (CPR) [7]-[9]. 

DNA fragmentation index (DFI) is one of the most important parameters in assessing male
infertility. DFI is correlated with the level of infertility and with serum levels of microelements
such as zinc and magnesium. It is also known that high levels of reactive oxygen species (ROS) in
semen plasma contribute to sperm DNA fragmentation. One study showed that DNA damage was
associated with low sperm motility but that DFI was not related to conventional sperm analysis
parameters. Another study using correlation analysis showed that DFI was positively correlated with
age, abstinence period, semen volume, and abnormal morphology of spermatozoa heads, meaning that
the higher these parameters, the higher the DFI value. On the other hand, DFI was negatively
correlated with sperm concentration, sperm progressive motility, and percentage of normal sperm
morphology. This is consistent with studies showing that sperm concentration, motility, and
morphology improve after varicocelectomy, which has also been shown to reduce DFI [10]-[12]. The
role of the DFI examination in predicting the outcome of IUI has recently grown worldwide, as
evidenced by a meta-analysis and systematic review in which a low DFI value had a relative risk of
3.15 for pregnancy after IUI, with a sensitivity of 94% and a specificity of 19%. However, further
research is still needed to determine a relevant and precise DFI cut-off value and the stability of
the test results over time [13]. 

The idea used in this study is to exploit the pattern of volatile organic compounds (metabolomic
gases) emitted by semen. The metabolomics study of semen can be used to develop electronic nose
technology for diagnostic support in infertility cases. Gas chromatography/mass spectrometry
(GC-MS) is considered quite expensive and time-consuming for routine clinical use, so electronic
nose technology can serve as a cheaper and faster alternative. The electronic nose interprets the
smell of semen through an array of gas sensors whose responses are converted into electrical
signals and analyzed with artificial intelligence algorithms. The selection of feature extraction
methods in electronic nose technology plays an important role in increasing accuracy. Feature
extraction methods currently in wide use for electronic nose development include window time
slicing (WTS), moving window time slicing (MWTS), moving window function capture (MWFC), curve
fitting parameters, transform domains, phase space (PS), dynamic moments (DM), parallel factor
analysis (PARAFAC), energy vector (EV), and power density spectrum (PSD). This study aims to
develop electronic nose technology that detects the characteristics of semen and classifies it
based on the clinical cut-off value of DFI [14]-[16]. 


2. METHOD 

This study used semen samples from 98 male patients with suspected infertility at the fertility
clinic of Sadewa Hospital, Yogyakarta, Indonesia. Samples were collected by masturbation at the
fertility clinic, after 2-7 days of abstinence, from patients attending for sperm analysis. In
addition, semen samples were taken from 21 subjects with an indication for DNA fragmentation index
(DFI) examination. Sperm analysis used the conventional technique and was carried out at the
laboratory of Sadewa Hospital, Yogyakarta. The remaining seminal fluid samples were stored in closed
containers in a refrigerator. Measurements were carried out using the electronic nose (GeNose) in
the Materials and Instrumentation Physics Laboratory of the Physics Department, Faculty of
Mathematics and Natural Sciences, Gadjah Mada University. 

As shown in Figure 1, the examination was carried out by pipetting 0.3 cc of the sample into a
100 cc beaker with a tight lid, to which a hose connected to the e-nose was then attached. Sampling
was carried out at room temperature, with a total test period of 60 seconds. The gas sensor array
pattern was recorded using data logger software. Several types of values were used for feature
extraction, followed by a supervised classification algorithm. A greater TMSC value means better
motility quality, and a lower DFI value means less damage to the DNA of the spermatozoa. 
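
To make the feature extraction step concrete, the following sketch summarises one sensor trace with the eight statistics named later in this section (gradient, max, min, mean, median, std, trapz, var) over the 10th to 60th second. It is an illustration under stated assumptions, not the study's code: the 1 Hz sampling rate, the function name `extract_features`, the definition of "gradient" as the average slope across the window, and the use of population (rather than sample) standard deviation and variance are all assumptions, since the text does not specify them.

```python
import statistics


def extract_features(signal, sample_rate_hz=1.0, start_s=10, end_s=60):
    """Summarise one sensor trace over the [start_s, end_s] time window."""
    lo = int(start_s * sample_rate_hz)
    hi = int(end_s * sample_rate_hz) + 1  # include the sample at end_s
    window = signal[lo:hi]
    if len(window) < 2:
        raise ValueError("window needs at least two samples")
    dt = 1.0 / sample_rate_hz
    # Trapezoidal rule for the area under the response curve ("trapz").
    area = sum((a + b) / 2.0 * dt for a, b in zip(window, window[1:]))
    # Assumption: "gradient" is the average slope across the window.
    slope = (window[-1] - window[0]) / (dt * (len(window) - 1))
    return {
        "gradient": slope,
        "max": max(window),
        "min": min(window),
        "mean": statistics.mean(window),
        "median": statistics.median(window),
        "std": statistics.pstdev(window),    # population standard deviation
        "trapz": area,
        "var": statistics.pvariance(window),  # population variance
    }


# Hypothetical trace: a linear ramp sampled once per second for 61 seconds.
trace = [0.5 * t for t in range(61)]
features = extract_features(trace)
print(features["gradient"], features["trapz"])  # 0.5 875.0
```

In practice one such feature dictionary would be computed per sensor in the array and the results concatenated into a single feature vector per sample.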


Figure 1. Method of examining semen samples using an electronic nose and beaker tube with a modified lid 


We divided the TMSC study subjects into three groups based on the TMSC value: the 'less than 30'
group, the 'between 30-60' group, and the 'more than 60' group. This division was based on previous
research that split the TMSC value into two parts, 'less than 20' and 'more than 20', to assess the
spontaneous pregnancy rate after intrauterine insemination (IUI) treatment [4]. We also divided the
DFI study subjects into three groups based on the DFI cut-off values: the 'equal or less than 15'
group, the 'between 16-25' group, and the 'equal or more than 25' group. This division was based on
previous studies that split the DFI value into 'less than 30' and 'more than 30' to assess the
outcome of intracytoplasmic sperm injection (ICSI) treatment [17]. 
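
The grouping rules can be expressed as two small labelling functions. This is a sketch with hypothetical function names. The handling of boundary values is an assumption, because the group names overlap or leave gaps: here a TMSC of exactly 30 or 60 falls in the middle group, a DFI of exactly 25 falls in the highest group, and a DFI between 15 and 16 falls in the middle group.

```python
def tmsc_group(tmsc):
    """Assign a TMSC value to one of the three study groups."""
    if tmsc < 30:
        return "less than 30"
    if tmsc <= 60:  # assumption: 30 and 60 belong to the middle group
        return "between 30-60"
    return "more than 60"


def dfi_group(dfi):
    """Assign a DFI value to one of the three study groups."""
    if dfi <= 15:
        return "equal or less than 15"
    if dfi < 25:  # assumption: exactly 25 belongs to the highest group
        return "between 16-25"
    return "equal or more than 25"


# Hypothetical values, not taken from the study data.
print(tmsc_group(50.95))  # between 30-60
print(dfi_group(20))      # between 16-25
```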

We extracted eight feature values from the 10th to the 60th second of the data: gradient, max,
min, mean, median, std, trapz, and var. The dataset was normalized with a min-max scaler during
preprocessing. Stratified 3-fold and 5-fold cross-validation was used during the machine learning
process, with the data distributed into a 30% testing dataset and a 70% training dataset. The
machine learning algorithms used were random forest (RF), k-nearest neighbors (KNN), support vector
classification (C=1, gamma='scale', kernel='rbf'), and linear discriminant analysis (LDA). For data
analysis, we used Python-based software. 
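
The workflow described above can be sketched with scikit-learn, whose `SVC` accepts the parameters quoted in the text. This is an illustrative sketch only. The data are synthetic (generated with `make_classification` to mimic 98 samples, 8 features, and 3 classes), the hyperparameters of RF, KNN, and LDA are library defaults, and the way the 70/30 split and the cross-validation were combined is an assumption, since the text does not specify it.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

# Synthetic stand-in for the extracted feature table (not the study data).
X, y = make_classification(
    n_samples=98, n_features=8, n_informative=5, n_redundant=0,
    n_classes=3, n_clusters_per_class=1, random_state=0,
)

# 70% training / 30% testing, stratified by class label.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

models = {
    "RF": RandomForestClassifier(random_state=0),
    "KNN": KNeighborsClassifier(),
    "SVM": SVC(C=1, gamma="scale", kernel="rbf"),
    "LDA": LinearDiscriminantAnalysis(),
}

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
results = {}
for name, model in models.items():
    # Scaling lives inside the pipeline so it is refit on each training fold.
    pipeline = make_pipeline(MinMaxScaler(), model)
    cv_scores = cross_val_score(pipeline, X_train, y_train, cv=cv, scoring="accuracy")
    pipeline.fit(X_train, y_train)
    results[name] = {
        "cv_mean": cv_scores.mean(),
        "test_acc": pipeline.score(X_test, y_test),
    }
    print(f"{name}: CV accuracy {cv_scores.mean():.2f}, "
          f"test accuracy {results[name]['test_acc']:.2f}")
```

Placing the scaler inside the pipeline avoids leaking information from validation folds into the normalization step. The scores printed for the synthetic data have no bearing on the study's results.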


3. RESULT AND DISCUSSION 

This research is a preliminary study to develop electronic nose technology for classifying
parameters derived from sperm analysis, namely total motile sperm count (TMSC) and DNA
fragmentation index (DFI). These two parameters are often used in clinical practice to help predict
the success of assisted reproductive technology. The characteristics of the TMSC research subjects
in Table 1 show that the average age of the subjects was 32.47 years. The length of storage of
samples in the laboratory refrigerator varied, with an average of 5.58 days. The abstinence period
of the study subjects was in accordance with the protocol for sperm analysis, with an average of
3.94 days. The total volume of semen produced by the research subjects also varied, with an average
of 3.16 cc. Volatile organic compounds in semen have been studied several times using gas
chromatography and mass spectrometry; however, understanding the pathophysiological process
underlying an infertility-related disease is important for classifying the results of these
metabolomics studies [18]-[20]. The characteristics of the TMSC research subjects are as follows. 

Table 1. Characteristics of TMSC research subjects (98 subjects) 
No  Parameter                   Category        Amount        Average value
1   Age                         <30 years       29            32.47 years
                                30-34 years     35
                                35-39 years     26
                                40-44 years     7
                                45-49 years     0
                                50-54 years     1
2   Storage time                <3 days         20            5.58 days
                                3-6 days        40
                                7-14 days       38
3   Abstinence period           <4 days         40            3.94 days
                                4-5 days        45
                                6-7 days        13
4   Semen volume                <1 cc           2             3.16 cc
                                1-2 cc          46
                                3-4 cc          42
                                5-6 cc          7
                                >7 cc           1
6   Total motile sperm count    >60             36 (36.7%)    50.95
                                30-60           18 (18.3%)
                                <30             44 (44.8%)


Semen is usually classified by TMSC and DFI values through direct observation of several sperm
analysis parameters under a microscope, which requires more time and care. The development of
electronic nose technology aims to accelerate the classification process by recognizing the pattern
of metabolomic gases detected from semen. It is hoped that this will facilitate and speed up
fertilization with assisted reproductive technology, especially the selection of the spermatozoa to
be used. Several parameters play a very important role in assisted reproductive technology,
including pregnancy and birth rates after in vitro fertilization (IVF) or intracytoplasmic sperm
injection (ICSI), sperm genetic integrity, and fertilization capacity [21]-[23]. The
characteristics of the DFI research subjects are shown in Table 2 and Table 3. 


Table 2. Characteristics of DFI research subjects based on sperm analysis (21 subjects) 
No  Parameter        Average value
1   Age              34.14 years
2   Volume           3.2 cc
3   Viability        71.57%
4   Concentration    26.54 million cells/cc

Table 3. Category of DFI group based on sperm analysis (21 subjects) 
No  DFI category             Amount
1   equal or less than 15    9 (42.8%)
2   between 16-25            6 (28.5%)
3   equal or more than 25    6 (28.5%)


The extracted mean, median, max, min, std, trapz, and var features can be reduced in dimension
using principal component analysis (PCA), with a total explained variance above 75%, as shown in
Figure 2. In the TMSC study, we obtained fairly good accuracy for several algorithms (RF, KNN, SVM,
LDA) on both training and testing. The best result was an accuracy of 0.95, a precision of 0.97,
and a recall of 0.92 for the random forest algorithm, as shown in Figure 3. This result supports
the theory that sperm quality rises with a higher TMSC value and a lower DFI value, because a high
TMSC value increases the probability of fertilization by sperm with good motility, and a low DFI
value correlates with a decreased risk of recurrent spontaneous abortion [24]. TMSC values are also
correlated with live birth rates in infertile couples. A randomized, prospective, multicenter study
investigated the association between the performance characteristics of intrauterine insemination
(IUI), post-processing TMSC, and the live birth rate in couples with unexplained infertility. The
results indicate that patient discomfort with the IUI method and low TMSC scores are associated
with lower live birth rates, while the preparation process for IUI and ultrasound guidance during
IUI were not related to the live birth rate. Thus, it can be concluded that a high TMSC value is
associated with a higher live birth rate in couples with unexplained infertility. A similar study
examined the predictive value of TMSC for spontaneous pregnancy and pregnancy after IUI, suggesting
that TMSC values >5x10^6 per ejaculation require treatment with IUI. This is also supported by the
finding that a TMSC value of 1-5x10^6 resulted in a lower spontaneous pregnancy rate compared to
those who received IUI treatment [25]. 

Figure 2. The visualization of 3D PCA (mean, median, max, min, std, trapz, var and grad) 
features extraction on TMSC dataset with a total value of explained variance above 75% 

Figure 3. Comparison of the confusion matrix in the TMSC classification dataset using stratified 
5-fold cross-validation on several algorithms (RF, KNN, SVM and LDA) 
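
Accuracy, precision, and recall of the kind reported for these classifiers can be derived from a confusion matrix such as those in Figure 3. The sketch below shows one way to do so. The macro-averaging of precision and recall across classes is an assumption (the averaging scheme is not stated in the text), and the example matrix is hypothetical rather than taken from the study.

```python
def classification_metrics(cm):
    """Return (accuracy, macro precision, macro recall).

    cm[i][j] is the number of samples of true class i predicted as class j.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    accuracy = sum(cm[i][i] for i in range(n)) / total
    precisions, recalls = [], []
    for k in range(n):
        predicted_k = sum(cm[i][k] for i in range(n))  # column sum
        actual_k = sum(cm[k])                          # row sum
        precisions.append(cm[k][k] / predicted_k if predicted_k else 0.0)
        recalls.append(cm[k][k] / actual_k if actual_k else 0.0)
    # Assumption: unweighted (macro) average over classes.
    return accuracy, sum(precisions) / n, sum(recalls) / n


# Hypothetical 3-class confusion matrix (not from the study).
cm = [
    [8, 1, 1],
    [0, 9, 1],
    [1, 0, 9],
]
accuracy, precision, recall = classification_metrics(cm)
print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
```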

The extracted mean, median, max, min, std, trapz, and var features can be reduced in dimension
using PCA, with a total explained variance above 80%, as seen in Figure 4. In the DFI
classification study, we obtained fairly good accuracy for two algorithms (RF and LDA) on both
training and testing. The best result was an accuracy of 0.71, a precision of 0.67, and a recall of
0.75 for the random forest algorithm, as shown in Figure 5. The DFI examination is important in
predicting low outcomes or failure of in vitro fertilization (IVF). A retrospective study showed
that, in men with mild to moderate asthenozoospermia, high DFI values were associated with lower
pregnancy success and an increased risk of low fertilization rate (LFR) or total fertilization
failure (TFF). Thus, the DFI value can be used as an outcome predictor in IVF procedures. Another
benefit of the DFI examination is the identification of risk factors for unexplained recurrent
abortion. One study describes the relationship between DFI values and recurrent spontaneous
abortion, where a high DFI value increases the incidence of unexplained recurrent spontaneous
abortion. In addition, other studies have shown that high DFI values are positively correlated with
IL-8 levels and negatively correlated with spermatozoa vitality in infertile patients with
leukospermia. This suggests a relationship between leukospermia and DNA damage in sperm cells,
which is thought to be due to high levels of malondialdehyde (MDA) produced by oxidative stress
[26], [27]. 

However, DFI examination is costly and time-consuming, so further research is needed on more
efficient methods that can provide an equally good outcome. One such effort is the metabolomic
study of volatile organic compounds produced by semen. A study of metabolomic markers in semen from
patients with infertility due to oligozoospermia demonstrated the potential of several biomarkers
that were positively correlated with the motility of spermatozoa. That study used nuclear magnetic
resonance spectroscopy with multivariate statistical analysis. In addition, the prostatosome has
been studied in relation to infertility. The prostatosome is an extracellular vesicle secreted by
prostate gland cells during ejaculation that consists of various proteins and plays an important
role in fertility-related immunology [10], [19]. 

Figure 4. The visualization of 3D PCA (mean, median, max, min, std, trapz, var and grad) features extraction 
on DFI dataset with a total value of explained variance above 85% 


This study has several weaknesses. No TMSC and DFI cut-off values have been validated by WHO to
serve as clinical guidelines, so the grouping in this study is based only on values estimated from
previous studies [9]. In addition, the number of samples is still very limited because of the
limited study time and the high cost of the examination, so more data are needed to improve the
accuracy of the results. The duration of refrigerated storage of the semen samples also varied,
which does not rule out changes in metabolomic gases due to the growth of contaminant bacteria.
Finally, other feature extraction models need to be tested to increase the accuracy obtained
[28]-[30]. 

Figure 5. Comparison of the confusion matrix in the DFI classification dataset using stratified 3-fold 
cross-validation on several algorithms (RF, KNN, SVM and LDA) 


4. CONCLUSION 

It can be concluded that electronic nose technology has good potential for classifying and
predicting the results of sperm analysis. The TMSC and DFI values, which are currently widely used
in clinics to predict the outcome of infertility therapy with assisted reproductive technology,
still need to be studied further using gas chromatography and mass spectrometry, especially through
metabolomic comparison of the value groups. This is necessary to facilitate the application of
electronic nose technology as a rapid diagnostic tool. In addition, further experimentation is
still needed on the extraction model that gives the best accuracy on the GeNose dataset for
varicocele cases. One suggestion is to analyze test results from fertile men and compare them with
test results from men with varicocele infertility and non-varicocele infertility. 